
Bridging Informatics and Earth Science: a 
Look at Gregory Leptoukh's Contributions 


Past, present (and future) 




Outline


1. Early Contributions 

2. Recent Work 

3. Unfinished business... 





Science background... 


• Tbilisi State University 

- 1975: M.S. in Theoretical Physics 

- 1985: Ph. D. in Cosmic Ray Physics* 

- Monte-Carlo simulation of cosmic 
ray propagation 

• 1994: North Carolina State 

- Molecular dynamics computer simulation 

• 1997: NASA, Goddard Distributed Active Archive 
Center 

- Data browser for SeaWiFS 



* Includes work at Moscow State University and Moscow Institute of Theoretical and
Experimental Physics


In the beginning, size was the issue... 


Volume Requirements 
40 GB/day (!) 
vs. 

Distribution Capacity 


Approach: Reduce volume 
of useless bits flowing out 







Greg was always out front, informatics-wise... 



Email from Greg, circa 1998 


One issue that I'd rather mer... [rest of line illegible]

CM problems: 

a. Stability of Java 

b. Compilers on all baselines (cost?) 

c. Change Makefile 

d. Dir structure for classes 

e. Classes library 

f. Prologues 

g. Proliferation of different flavors 

h. Modularization 


The Other Data Problem: Usability 


• Data in Hierarchical Data Format (HDF) 

- Also with an HDF-EOS layer on top 

• Required use of an API (C or Fortran) to read 
data 

• Terminology 

- Data variable names 

- Stride, offset, grid, swath, fill value, SDS, ... 


Leptoukh - Kaufman Collaboration 



Yoram Kaufman 
1948 - 2006 


Yoram wanted to explore MODIS 
Aerosol data without having to 
download data or write code 

Kick-started the MODIS Online 
Visualization and Analysis 
System (MOVAS) 

- Seed funding 

- Initial requirements 

- Became Giovanni 



The Giovanni Way 


Pre-Science

Find data
Retrieve high volume data
Learn formats + develop readers
Extract parameters
Perform spatial + other subsetting
Identify quality + other flags and constraints
Perform filtering/masking
Develop analysis + visualization
Accept/discard/get more data

DO SCIENCE

Exploration
Initial Analysis
Use the best data for the final analysis
Derive conclusions
Write the paper
Submit the paper

Web-based Services: The Giovanni Way

Minutes

Days for exploration

DO SCIENCE


Web-based tools like Giovanni let scientists shrink the time needed for
pre-science preliminary tasks: data discovery, access, manipulation,
visualization, and basic statistical analysis.

Scientists have more time to do science!







User-Driven Development 










Some Giovanni Features 


Visualization & Analysis 

• Time-averaged map 

• Area-averaged Time series 

• Hovmoller 

• Animation 

• Vertical profile 

• Vertical cross-section 

• Climatology + Anomaly 


Comparison Features 

• Scatterplot + linear regression

• Difference Map 

• Difference Time Series 

• Correlation Map 

• Map Overlays 

Other Features 

• Download data: netCDF, ASCII 

• Create KML file 

• Show lineage 




Basic Visualization: Time-averaged Maps 



Dust Storm, 30 Apr 2003

[Map: Terra Aerosol Optical Depth at 0.55 micron (daytime) [unitless], 30 Apr 2003]







Exploring in time: Area-Averaged Time Series 



Dust Storm, 30 Apr 2003

[Time series: Terra Aerosol Optical Depth at 0.55 micron (daytime) [unitless],
for area (Lat: 11.0N-20.0N, Lon: 24W-14.5W); time axis from 1 Feb to 1 May 2003]
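
An area-averaged time series reduces each daily grid to one number over the selected box. The following is a minimal sketch, assuming a regular latitude-longitude grid, cosine-of-latitude weighting, and a fill value of -9999.0. The function names, the weighting, and the sample values are illustrative assumptions, not Giovanni's documented implementation.

```python
import math

FILL = -9999.0  # assumed fill value marking missing grid cells


def area_average(grid, lats, fill=FILL):
    """Cosine-latitude weighted mean of one 2-D grid (one row per latitude)."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(grid, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            if value != fill:
                total += weight * value
                weight_sum += weight
    # None signals that the whole box was missing on that day
    return total / weight_sum if weight_sum > 0 else None


def area_averaged_series(daily_grids, lats):
    """One area average per daily grid, in time order."""
    return [area_average(grid, lats) for grid in daily_grids]


# Made-up values for a 2 x 2 box on two days
lats = [11.5, 12.5]
day1 = [[0.2, 0.4], [0.6, FILL]]
day2 = [[FILL, FILL], [FILL, FILL]]
series = area_averaged_series([day1, day2], lats)
```

Days with no valid cells come back as None, so a plot can show a gap instead of a misleading zero.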






Multi-sensor Analysis: X-Y Scatterplot 


How well do 
they correlate? 


AirNow PM 2.5 vs. MODIS Terra Aerosol Optical Depth 
01 Aug 2007 to 05 Aug 2007 


Scatter Plot 

Time: 01 Aug2007-05Aug2007 Area: (35N-45N, S3W-72W) 


Y = 23.8B20X + 12.8313 
Correlation: R = G.C10C 
RMS Error = 9.3055 


count =451 
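
The scatterplot reports a regression line, a correlation coefficient, an RMS error, and a count. The following is a minimal sketch of how such statistics can be computed with ordinary least squares. The function name, the RMS definition (residuals divided by the count), and the sample values are assumptions for illustration, not the values or code behind the slide.

```python
import math


def linear_fit_stats(x, y):
    """Ordinary least-squares fit y = slope * x + intercept with summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    rms_error = math.sqrt(sum(e * e for e in residuals) / n)
    return {"slope": slope, "intercept": intercept, "r": r,
            "rms_error": rms_error, "count": n}


# Made-up, perfectly linear sample: AOD on x, PM2.5 on y
aod = [0.1, 0.2, 0.3, 0.4]
pm25 = [15.0, 17.0, 19.0, 21.0]
stats = linear_fit_stats(aod, pm25)
```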
Feature: Download Data


• Giovanni as workflow tool

• DIY publication-ready figures



Visualization Results 


Download Data Product Lineage 


Acknowledgment Policy 


Download source data products and data products derived from Giovanni processing stages. For simplicity,
only the initial retrieval and final rendering phases are currently accessible for downloading. Supported download formats
are HDF, NetCDF (NCD), ASCII, and KMZ (ASCII is available only when the array size is within about half a million points).
To download multiple files at once, select the desired files (from any section) by clicking on their associated
checkboxes, and then click 'Download in Batch'. Note that 'N/a' means that a file size or other column value is not
available; 'saa' means that a file is exactly the same as the previous one in the list. Also, not all services and data
products support all download file formats.


Initial Data Retrieval

[Screenshot: download table with a 'Download in Batch' button. Each row offers
HDF, NCD, and ASC downloads of a daily MOD08_D3.051 (Optical_Depth_Land_And_Ocean_Mean)
file, including 2008-11-15T00:00:00Z and 2008-11-16T00:00:00Z.]

Two Dimensional Map Plot

[Screenshot: input files dated 2008-01-01T00:00:00Z (correlation and correlation count);
the output file is a correlation map GIF for MOD08_D3.051 and MYD08_D3.051.]





Feature: DIY OMI Gridded Products 



UV Aerosol Index (OMTO3G.003)

Aerosol Absorption Optical Depth (OMAERUVG.003)

Set filter values for the named OMI parameter

Set parameter preference values

Return to plot



Feature: Lineage 



Visualization Results Download Data Product Lineage A 

Browse the processing details of the iatlonplot corr.xmi visualization service. 

Data Fetching 

Fetched datafile(s) using spatial constraints of South: -90 North: 90 East: 180 West: -180 and te...
Aerosol Optical Depth at 550 nm from MYD08_D3.051
Aerosol Optical Depth at 550 nm from MOD08_D3.051

Preprocessor

The original data files are reformatted to HDF-4 format. Scaling factors are applied, and in some...

GrADS Regridding

The following are currently the defaults in the regridder, in order of priority: If all datasets have thi...

Grid Subsetter

Extracted spatial subset of each parameter in previous step using spatial constraint of South: ...

Anomaly

Anomalies are computed as the differences between a parameter and a selected climatology ov...

Dimension Collapse

Averaged parameter(s) over the selected spatial area of South: -90 North: 90 East: 180 West: -1...

Time Stitch

This step constructs a timeseries for a parameter over the entire temporal range of the selection...

Correlation Map

Calculated correlation coefficients at each grid point.

Two Dimensional Map Plot

Generated image(s) with options:

Map Projection = latlon
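
The Correlation Map step in the lineage computes a correlation coefficient at each grid point. The following is a minimal sketch of that idea, assuming two time-ordered stacks of 2-D grids on the same grid and a fill value of -9999.0. The function names and the handling of missing data are assumptions, not the Giovanni code.

```python
import math

FILL = -9999.0  # assumed fill value


def pearson(a, b):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(a)
    if n < 2:
        return None
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return None  # a constant series has no defined correlation
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def correlation_map(stack_a, stack_b, fill=FILL):
    """Per-cell correlation over time; stacks are indexed [time][row][col]."""
    n_rows = len(stack_a[0])
    n_cols = len(stack_a[0][0])
    corr, count = [], []
    for i in range(n_rows):
        corr_row, count_row = [], []
        for j in range(n_cols):
            # keep only time steps where both datasets have valid data
            pairs = [(ga[i][j], gb[i][j])
                     for ga, gb in zip(stack_a, stack_b)
                     if ga[i][j] != fill and gb[i][j] != fill]
            count_row.append(len(pairs))
            corr_row.append(pearson([p[0] for p in pairs],
                                    [p[1] for p in pairs]))
        corr.append(corr_row)
        count.append(count_row)
    return corr, count


# Made-up 1 x 2 grids over three time steps
stack_a = [[[0.1, 0.2]], [[0.2, 0.2]], [[0.3, FILL]]]
stack_b = [[[0.2, 0.5]], [[0.4, 0.1]], [[0.6, 0.3]]]
corr, count = correlation_map(stack_a, stack_b)
```

Returning the per-cell sample count next to the coefficient lets a reader judge how much data supports each value.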



Standards Support 






:p/gome_omi_comparison_beta.php 


Difference Map 


This map shows the difference O3 GOME-2 (DLR) - OMI (TOMS-like, NASA)


Date : 2009 : February 


Web Portal 


Giovanni Output 

- Download Formats 

• Keyhole Markup Language 

• netCDF/CF1

- OGC Service Formats 

• Web Map Service 

• Web Coverage Service 

Giovanni Input 

- Web Coverage Service 

- OPeNDAP 


WMS 
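
A Web Map Service client asks for a rendered map with a GetMap request. The following is a minimal sketch that builds such a request URL using the standard WMS 1.1.1 query parameters. The endpoint and layer name are hypothetical placeholders, not a real Giovanni service address.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def wms_getmap_url(endpoint, layer, bbox, width=800, height=400,
                   image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string requests the default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only
url = wms_getmap_url("https://example.org/wms", "aerosol_optical_depth",
                     (-180, -90, 180, 90))

# Parse the URL back to inspect the parameters that were sent
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```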




Giovanni Impact 


• Science Research 


• Applications: 

- Air Quality 

- Agriculture 

- Water Resources 


Publications

[Chart: publications by year, 2004 to 2012]


• Education: 

Data-enhanced Investigations for Climate Change Education 
(DICCE) 


Informatics: ...coming up next 


Giovanni as Informatics Laboratory 



• Does it make it "too easy" to get results? 

- Spurious comparisons in multi-sensor analysis? 

• Motivates attention to documentation 

- Provenance and citation of dynamic results 

- Documentation for variables 




Recent Work 



• Multi-Sensor Analysis 

• Community Work 

• Collaboration Enablement 


Multi-Sensor Analysis 


• Multi-sensor Data Synergy Advisor (MDSA) 

• AeroStat 


MDSA Problem Statement 


Same measurement 
(Aerosol Optical Depth) 
Different results - why?


Different provenance 




MDSA & Semantic Web Adoption 


[Diagram: data quality ontology.

Classes: Data Entity, Data Figure, DataPlot, DataRepresentation, Measure,
QualityAspect, QualityAssertion, QualityEvidence, QualityEvidenceFunction,
QualityExpression, QualityIndicator

Properties: hasMeasure, hasRepresentation, computedByFunction, evidenceOfIndicator,
assertionBasedOnEvidence, evidenceDescribes, describedByQualityEvidence,
hasQualityAssertion, indicatorOfQualityAspect, assertionAboutQualityAspect,
hasQualityExpression]

Collaborators @ RPI: Peter Fox, Stephan Zednik, Patrick West, + students 









MDSA provides users with key info about similarities
and known issues when comparing or merging datasets


MDSA Similarity Report 


About your selected parameters: 



Instrument               MODIS        MODIS
Platform                 Aqua         Terra
Equator crossing time    13:30:00     10:30:00
Daytime node             ascending    descending

Giovanni operations (both parameters)

• Time Averaging
• Two-Dimensional Map Plot



Maximum time difference between Aqua and Terra observations

MDSA Known Issues Report 
Known issues:


The difference in equator crossing time (EQCT) and daytime node, modulated by the data-day definition, causes a difference in the
overpass times of the included data, which produces an artifact in the difference. See sample images:

AeroStat Goals


• Compare and combine aerosol data 

• Compare aerosol satellite with ground truth 

• Merge aerosol measurements from multiple 
instruments 

• Adjust for inter-instrument biases when merging 
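
The last two goals can be sketched in a few lines. This is a minimal example under stated assumptions: each instrument is related to ground truth by a straight line, the line is fitted by least squares, and merged values are simple averages. The AeroStat plots label their adjustment BIAS_NN; the linear correction here is a simpler stand-in, and all function names and sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjust(satellite, slope, intercept):
    """Invert satellite = slope * truth + intercept to estimate the truth."""
    return [(s - intercept) / slope for s in satellite]


def merge(*series):
    """Average the values available at each position; None marks missing data."""
    merged = []
    for values in zip(*series):
        present = [v for v in values if v is not None]
        merged.append(sum(present) / len(present) if present else None)
    return merged


# Made-up collocated samples: ground truth and two biased instruments
truth = [0.1, 0.2, 0.3, 0.4]
sat_a = [0.15, 0.25, 0.35, 0.45]   # constant offset of +0.05
sat_b = [0.08, 0.16, 0.24, 0.32]   # multiplicative bias of 0.8

adj_a = adjust(sat_a, *fit_line(truth, sat_a))
adj_b = adjust(sat_b, *fit_line(truth, sat_b))
merged = merge(adj_a, adj_b)
```

Fitting against ground truth first and merging afterwards keeps one instrument's bias from leaking into the combined product.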


Collaborators: C. Ichoku, R. Kahn, L. Remer, R. Levy, et al. @ GSFC; David Lary, et al. @ UT-Dallas




AeroStat System







AOD at Capo_Verde (Lat=16.733 Lon=-22.935 Alt=60m)
01 Jan 2003 to 31 Dec 2003

[Figure: three scatterplots of satellite AOD against AERONET L2.2 Interpolated AOD 550nm Mean,
each with a 1:1 line. Panels: unfiltered, Quality-filtered, and QC+Bias Adjustment.]

Series:
- MYD04.051 AOD Dark Target Ocean 550nm Mean
- MOD04.051 AOD Dark Target Ocean 550nm Mean
- MISR.022 AOD 558nm Mean

Quality filters: AERONET (nval >= 2); MYD04 and MOD04 (nval >= 5, QAavg-o >= 1); MISR (nval >= 2, plus a QA flag threshold)
Bias adjustment: BIAS_NN

Legible fits:
- MOD04.051, unfiltered: y = 1.059x + 0.02, RMS = 0.09, R^2 = 0.965, N = 73
- MOD04.051, quality-filtered: y = 1.069x + 0.005, RMS = 0.103, R^2 = 0.945, N = 35
- MISR.022, quality-filtered: y = 1.029x + 0.029, RMS = 0.063, R^2 = 0.915, N = 11
- MYD04.051, QC + bias adjustment: y = 0.92x + 0.009, RMS = 0.04, R^2 = 0.969, N = 23





AeroStat Problem Statement



MODIS Dark Target

MODIS Deep Blue

[Maps, Mar 2007: MOD08_D3.051 Aerosol Optical Depth at 550 nm [unitless];
MOD08_D3.051 Deep Blue Aerosol Optical Depth at 550 nm (QA-w, Land only) [unitless];
MIL3DAE.004 Aerosol Optical ... nm (Green Band) [unitless]]

Merged Maps: 1 Mar 2003

[Map: AOD Giovanni 0.50x0.50 deg for 01 Mar 2003]

Collaborations in Informatics


• Community Giovanni Portals 

- Northern Eurasian Earth Science Partnership Initiative
(NEESPI)

• Community Participation 

- Earth and Space Science Informatics (ESSI) 

- Earth Science Information Partners (ESIP) 

- NASA Earth Science Data Systems Working Group 
(ESDSWG) 

- GEWEX Aerosol Panel 

• Mentor for interns and fellows 

• Collaboration Features 

- gSocial 


Collaborative Annotation with gSocial* 


• Results 

- Annotation of results graphics 

- Discussion of results 

- Reproducing results 

• Designed for AeroStat, but... 

• ...Can be integrated with other REST based 
services 


*Implemented by Daniel DaSilva, NASA/GSFC


Discussing the Result

Weird dual correlation

Appears in NON-seasonal graph; does not appear for seasonal plot

Load this Result and Modify Criteria
Show/Hide criteria

Add new comment

Mon, 11/20/2011 - 15:30 - leptoukh

The two distinct prongs reflect two different aerosol regimes in this region. The slope is
affected by the difference in scattering properties of aerosol of different size at different ...


Reproducing the Result

Load this Result and Modify Criteria

Unfinished Business... 



• Underway 

- Data Quality 

- Giovanni-4: Faster, better, cheaper 

• Still nascent... 

- myGiovanni 


Data Quality 


• Characterizing Quality 

- Variations in bias 

- Quality needs for different user types 

- Additional quality dimensions 

• Collaborations on Quality 

- QA4EO 

- ESIP Information Quality Cluster 

• Now chaired by Brent Maddux (U. Wisconsin) 


Characterizing Quality with Quality Labels 


Quality Facts Report

Quality Facts

Product: MODIS Terra Daily Collection 5.1, Aerosol Optical Depth at 550 nm

Accuracy (vs Aeronet)

45% within expected error (Africa above Equator) [1]

66% within expected error (East Asia mid-latitudes) [1]

76% within expected error (Europe - Mediterranean) [1]

slope of linear regression fit (MODIS vs Aeronet) - 0.03 (Global) [1]

67% within expected error (Global, land-only) [1]

64% within expected error (Global, ocean-only) [2]

61% within expected error (Indian Subcontinent) [1]

52% within expected error (Peninsular Southeast Asia) [1]

bad compliance 
good compliance 
very good compliance 
low bias underestimation 
good compliance 
good compliance 
very good compliance 
marginal compliance 
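
A "percent within expected error" figure can be computed from collocated satellite and ground measurements. The following is a minimal sketch, assuming the envelope commonly cited for MODIS dark-target retrievals over land, plus or minus (0.05 + 0.15 x AOD). The report shown here does not state which envelope it uses, so the default coefficients, the function names, and the sample values are all assumptions.

```python
def expected_error(aod_reference, absolute=0.05, relative=0.15):
    """Half-width of the expected-error envelope around the reference AOD."""
    return absolute + relative * aod_reference


def percent_within_expected_error(satellite, reference):
    """Share of collocated pairs whose difference falls inside the envelope."""
    pairs = list(zip(satellite, reference))
    within = sum(1 for s, r in pairs if abs(s - r) <= expected_error(r))
    return 100.0 * within / len(pairs)


# Made-up collocated values: two of the four pairs fall inside the envelope
aeronet = [0.10, 0.20, 0.50, 1.00]
modis = [0.12, 0.35, 0.55, 1.30]
pct = percent_within_expected_error(modis, aeronet)
```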

Measurement Characteristics 

Platform: Terra


Generated for a request for 20-90 deg N, 0-180 deg E 




Spatial Completeness 

MODIS Aqua AOD Average Daily Spatial Coverage

by Region + Season



This table and chart are Quality Evidence for the Spatial
Completeness (Quality Property) of the MODIS Aqua Dataset
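
Average daily spatial coverage can be estimated by counting valid grid cells per day. The following is a minimal sketch, assuming coverage means the unweighted percentage of cells that do not hold a fill value of -9999.0. Both the definition and the names are assumptions, not the method behind the slide's table.

```python
FILL = -9999.0  # assumed fill value


def spatial_coverage(grid, fill=FILL):
    """Percentage of grid cells holding valid (non-fill) data."""
    cells = [value for row in grid for value in row]
    valid = sum(1 for value in cells if value != fill)
    return 100.0 * valid / len(cells)


def average_daily_coverage(daily_grids, fill=FILL):
    """Mean of the per-day coverage percentages."""
    return sum(spatial_coverage(g, fill) for g in daily_grids) / len(daily_grids)


# Made-up 2 x 2 grids for two days: 3 of 4 cells valid, then 1 of 4
day1 = [[0.2, 0.3], [0.1, FILL]]
day2 = [[FILL, FILL], [0.4, FILL]]
coverage = average_daily_coverage([day1, day2])
```

Grouping the daily grids by region and season before averaging would give a breakdown like the one the slide describes.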


Giovanni-4: Next Generation 


• Emphasis on data exploration 

• Collaboration with end users... 



Exploring Data: Interactive Scatterplot

[Scatterplot: Aerosol Optical Depth 550 nm (Dark Target), MODIS-Aqua and MODIS-Terra]
2012-06-01 through 2012-06-06
Latitude: 25.5 to 49.5
Longitude: 100.5 to 99.5
Regression Line Equation: y = 0.9206438257515057x + 0.09143647870212385


Zoom in on scatterplot

[Scatterplot: same parameters, zoomed in]
Regression Line Equation: y = 0.7507112587684994x + 0.3484881241472614


Examine a subregion

[Scatterplot: same parameters]
Regression Line Equation: y = 0.9206438257515057x + 0.09143647870212385


Current G3 Performance




http://gdata1.sci.gsfc.nasa.gov

The date range you entered covers more than one year, but at least one parameter in your query has a
small time interval (i.e., it is repeated daily, hourly, minutely, etc.). This could result in a great many data
being returned in the query (and could take more than a few minutes). Are you sure you want to submit this
large query?

OK


Elapsed Time (s) 



Initial G3-G4 Performance Tests 




Participatory Design: myGiovanni 








GES DISC


visualize 












Charge to the Informatics Community 





• Carry on:

- Multi-sensor Data Analysis

- Data Quality

- Participatory design

• Build Bridges

- Data and Science

- Informatics and Earth Science



Greg at QA4EO Meeting 


http://qa4eo.org/workshop_harwellll.html

Thank you




